svm: Avoid VMSAVE/VMLOAD/VMSAVE/VMLOAD sequence on every vmexit/vmentry.
Instead do this only on context switches. In cases where we need
access to state that is only saved to the VMCB on VMSAVE, we track
whether the state is in sync via a per-vcpu flag and VMSAVE on demand.
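
The on-demand VMSAVE described above can be sketched roughly as follows. This is only an illustrative model, not the code in this changeset: every name here (sketch_vcpu, sketch_vmcb, vmcb_in_sync, sketch_sync_vmcb, and so on) is hypothetical, FS base merely stands in for any state that reaches the VMCB only via VMSAVE, and the VMSAVE instruction is simulated by a plain copy plus a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the VMCB; fs_base represents any field
 * that is only written back to the VMCB by VMSAVE. */
struct sketch_vmcb {
    uint64_t fs_base;
};

/* Hypothetical per-vcpu state. */
struct sketch_vcpu {
    struct sketch_vmcb vmcb;
    uint64_t hw_fs_base;        /* models the live hardware register */
    bool vmcb_in_sync;          /* true if VMSAVE-only state in vmcb is current */
    unsigned int vmsave_count;  /* counts simulated VMSAVE executions */
};

/* Simulated VMSAVE: copy the live register state into the VMCB. */
static void sketch_vmsave(struct sketch_vcpu *v)
{
    v->vmcb.fs_base = v->hw_fs_base;
    v->vmsave_count++;
}

/* VMSAVE only if the VMCB copy is stale. */
static void sketch_sync_vmcb(struct sketch_vcpu *v)
{
    if (v->vmcb_in_sync)
        return;
    sketch_vmsave(v);
    v->vmcb_in_sync = true;
}

/* On vmexit the guest may have changed the register, so the VMCB copy
 * is marked stale instead of being refreshed eagerly. */
static void sketch_vmexit(struct sketch_vcpu *v, uint64_t guest_fs_base)
{
    v->hw_fs_base = guest_fs_base;
    v->vmcb_in_sync = false;
}

/* Any reader of VMSAVE-only state syncs first. */
static uint64_t sketch_read_fs_base(struct sketch_vcpu *v)
{
    sketch_sync_vmcb(v);
    return v->vmcb.fs_base;
}
```

In this model a vmexit that never looks at the VMSAVE-only state costs no VMSAVE at all, and repeated reads between two vmexits cost exactly one.
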
The context switch code can be further improved:
1. No need to VMLOAD host state if we are switching to another SVM VCPU.
2. No need to VMSAVE host state at all (except once at start of day)
because the registers that are saved do not change (or at least, none
of the ones that matter change).
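
For illustration, the two possible improvements listed above might look something like the sketch below. This models the proposed follow-up, not what this changeset does. All names (sketch_task, sketch_stats, sketch_context_switch, sketch_start_of_day) are hypothetical, and the VMSAVE/VMLOAD instructions are replaced by counters so that the logic can be checked in isolation.

```c
#include <assert.h>
#include <stdbool.h>

/* Counters standing in for the real instructions. */
struct sketch_counters {
    unsigned int host_vmsave;
    unsigned int host_vmload;
    unsigned int guest_vmsave;
    unsigned int guest_vmload;
};

/* Hypothetical description of whatever is being scheduled. */
struct sketch_task {
    bool is_svm_vcpu;
};

static struct sketch_counters sketch_stats;

/* Improvement 2: host state is VMSAVEd once at start of day, on the
 * assumption that the saved registers that matter never change. */
static void sketch_start_of_day(void)
{
    sketch_stats.host_vmsave++;
}

static void sketch_context_switch(const struct sketch_task *prev,
                                  const struct sketch_task *next)
{
    if (prev->is_svm_vcpu) {
        /* Guest state must be saved when leaving an SVM VCPU. */
        sketch_stats.guest_vmsave++;

        /* Improvement 1: skip the host VMLOAD if the next VCPU will
         * VMLOAD its own guest state anyway. */
        if (!next->is_svm_vcpu)
            sketch_stats.host_vmload++;
    }

    if (next->is_svm_vcpu)
        sketch_stats.guest_vmload++;
}
```

Under these assumptions, a switch between two SVM VCPUs costs one VMSAVE and one VMLOAD, and host state is never saved again after boot.
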
The performance improvement is about 650 cycles for a null
hypercall. This reduces the total null-hypercall time for a non-debug
build of Xen down to around 3300 cycles on my AMD X2 system.
Signed-off-by: Keir Fraser <keir@xensource.com>